Bilingual OCR System for Myanmar and English Scripts with Simultaneous Recognition
Authors
Htwe Pa Pa Win, Phyo Thu Thu Khine, Khin Nwe Ni Tun
Abstract
The growth of digital libraries worldwide raises many new challenges for document image analysis research and development. Storing a wide variety of document images in a digital library, for example cultural, technical, or historical documents written in many languages, also demands advances in present-day digital image analysis systems. When the digital library holds science and technology documents, the OCR system must become bilingual, since most such documents are written in Myanmar combined with English letters. This paper proposes a bilingual OCR system that simultaneously recognizes printed English and Myanmar text, including a segmentation mechanism that handles the overlapping nature of Myanmar script. The effectiveness of the proposed mechanism is demonstrated by experimental results on segmentation accuracy, comparisons of feature extraction methods, and overall accuracy rates.
Similar resources
A Comparative Analysis of Classifiers Accuracies for Bilingual Printed Documents (Oriya-English)
Bilingual document recognition has been the subject of intensive research, and our focus is on the recognition of Oriya-English bilingual documents. In most of our official papers and school textbooks, English words are interspersed within the Indian languages. So there is a need for an Optical Character Recognition (OCR) system which can recognize these bilingual documents and st...
A Script Recognizer Independent Bi-lingual Character Recognition System for Printed English and Kannada Documents
Department of Computer Science, Amrita Vishwa Vidyapeetham, Mysore Campus, Bogadi, Mysore, India. Abstract: Recognition of text document images is the goal of any optical character recognition system. This paper aims at extending the functionality of optical character recognition system to recognize more t...
Character Level Separation and Identification of English and Gujarati Digits from Bilingual (English-Gujarati) Printed Documents
Nowadays, English script is often interspersed within the Indian languages. So there is a need for an optical character recognition (OCR) system which can recognize these bilingual documents and store them for future use. Hence, in this paper an OCR system is proposed that can read documents containing Gujarati and English scripts (digits only). These scripts have many features in ...
Script Identification from Bilingual Gujarati-English Documents
In a multilingual country like India, English words are interspersed within the Indian regional languages in most official papers, school textbooks, and magazines. So a bilingual Optical Character Recognition (OCR) system is needed which can recognize these bilingual documents and store them for future use. In this paper the authors present an OCR system developed for the script ...
A Structural Analysis Based Feature Extraction Method for OCR System For Myanmar Printed Document Images
This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors in achieving high recognition performance in an Optical Character Recognition (OCR) system is the selection of the feature extraction method. Different types of existing OCR systems used various feature extraction methods because of the diversity of the script...